Lab 1 Objectives:

Set-up:

0. Attach packages

Reminders:

  • Use library(package_name) to attach an installed package
  • If the package is not found, you need to install it (once) by running install.packages("package_name") in the Console
  • Remember that you have to actually run the code to attach packages for their functions to be available
library(tidyverse)
library(janitor)
library(here)
library(plotly)
library(gghighlight)
library(sf)
library(blogdown)
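If a library() call fails with "there is no package called ...", the reminders above apply: install it once, then attach it. The following is a minimal sketch that checks which of this lab's packages are not installed yet. The names `pkgs`, `installed` and `missing_pkgs` are illustrative only. The install line is left commented out so that you decide when to run it in the Console.

```r
# Packages attached in this lab
pkgs <- c("tidyverse", "janitor", "here", "plotly", "gghighlight", "sf", "blogdown")

# requireNamespace() returns FALSE (quietly) when a package is not installed
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  # Uncomment to install the missing packages (only needed once)
  # install.packages(missing_pkgs)
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
}
```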

1. Read in & explore US incarceration data

Data: Prison populations in the United States from The Vera Institute

Reminders:

  • When working in R Markdown, add code in code chunks. You can create a chunk by pressing the green ‘Insert’ button and choosing R, or by using the shortcut Command + Option + I (Ctrl + Alt + I on Windows)
  • Use here() to navigate to folders not in your top-level working directory (discuss: why is this important?)
us_prison <- read_csv(here("data","incarceration_trends.csv"))
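It can help to see what here() actually returns. It only builds a file path string anchored at the project root, for example the folder that holds your .Rproj file. It does not read anything. The variable name `csv_path` is just for illustration:

```r
library(here)

# here() locates the project root and joins the pieces onto it,
# so the same line works no matter which subfolder the .Rmd lives in
csv_path <- here("data", "incarceration_trends.csv")
csv_path
```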

Always look at your data.

Every time.

Every. Single. Time.

Here are some useful functions for data exploration:

  • View() - or alternatively, click on the object in the ‘Environment’ tab
  • summary()
  • head()

Let’s use them to check out the us_prison object we’ve just stored:

# View(us_prison)
# summary(us_prison)
# head(us_prison)
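Later steps remove rows with unreported prison populations, so it is also worth counting missing values while you explore. This is a small sketch on a hypothetical toy data frame (`toy_prison`) that stands in for us_prison. It uses only base R:

```r
# Hypothetical toy data standing in for us_prison
toy_prison <- data.frame(
  year = c(2000, 2000, 2001),
  state = c("CA", "TX", "CA"),
  total_prison_pop = c(100, NA, 200)
)

head(toy_prison, n = 2)     # first two rows
summary(toy_prison)         # numeric columns also report a count of NA's
colSums(is.na(toy_prison))  # number of missing values in each column
```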

Familiarize yourself with the data. Note that for each county there are total populations with breakdowns by sex and race, as well as jail and prison populations broken down by sex and race.

2. Wrangling review 1 - California incarceration

In this review section, we’ll explore only one question: how has the proportion of people in California prisons who are black changed over time?

Reminders:

  • Use the pipe operator (%>%) to link multiple steps in sequence
  • Check the resulting object after every wrangling step
  • Annotate your code!

The steps we’ll use here:

  1. Use dplyr::select() to choose which columns to keep (not strictly needed here, but a good reminder):
    • year
    • state
    • county_name
    • total_prison_pop
    • black_prison_pop
  2. Use dplyr::filter() to only keep observations from California
  3. Use tidyr::drop_na() to remove any rows where the prison populations were not reported
  4. Use dplyr::group_by() + dplyr::summarize() to calculate the totals each year for the entire state
  5. Use dplyr::ungroup() to get rid of any grouping
  6. Use dplyr::mutate() to add a column that is the proportion of imprisoned people who are black each year

Here is what that sequence looks like using the pipe operator:

ca_prison_prop_bl <- us_prison %>% 
  select(year, state, county_name, total_prison_pop, black_prison_pop) %>% 
  filter(state == "CA") %>% 
  drop_na(total_prison_pop, black_prison_pop) %>% 
  group_by(year) %>% 
  summarize(
    tot_pris_pop = sum(total_prison_pop),
    pris_pop_black = sum(black_prison_pop)
  ) %>% 
  ungroup() %>% 
  mutate(prop_black = pris_pop_black / tot_pris_pop)
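To check what each step does, you can run the same pipeline on a tiny dataset where you can verify the answer by hand. The data below are hypothetical (two counties, two years, one unreported value). Note that drop_na() removes the whole row. County B's 2001 black population is therefore excluded along with its missing total, which keeps the numerator and denominator consistent:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: County B did not report a total for 2001
toy_prison <- tibble(
  year = c(2000, 2000, 2001, 2001),
  state = c("CA", "CA", "CA", "CA"),
  county_name = c("County A", "County B", "County A", "County B"),
  total_prison_pop = c(100, 300, 200, NA),
  black_prison_pop = c(30, 90, 50, 20)
)

toy_prop <- toy_prison %>%
  select(year, state, county_name, total_prison_pop, black_prison_pop) %>%
  filter(state == "CA") %>%
  drop_na(total_prison_pop, black_prison_pop) %>%  # drops County B, 2001
  group_by(year) %>%
  summarize(
    tot_pris_pop = sum(total_prison_pop),
    pris_pop_black = sum(black_prison_pop)
  ) %>%
  ungroup() %>%
  mutate(prop_black = pris_pop_black / tot_pris_pop)

toy_prop
# 2000: 120 / 400 = 0.30; 2001: 50 / 200 = 0.25
```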

3. ggplot2 for data visualization

Let’s refresh our data viz skills with ggplot2 by creating a graph of the proportion of imprisoned people in California who are black from 1983 - 2015:

ggplot(data = ca_prison_prop_bl, aes(x = year, y = prop_black)) +
  geom_line() +
  scale_y_continuous(limits = c(0, 0.40)) +
  theme_minimal() +
  labs(x = "Year",
       y = "Proportion black (of California total imprisoned)")

4. What if I wanted to do this for all 50 states?

The glory of reproducible code! I can copy the code from above, EXCEPT:

  • Remove filter for state == "CA"
  • When grouping to calculate totals, group by year AND state
us_prison_prop_bl <- us_prison %>% 
  select(year, state, county_name, total_prison_pop, black_prison_pop) %>% 
  drop_na(total_prison_pop, black_prison_pop) %>%
  group_by(year, state) %>% 
  summarize(
    tot_pris_pop = sum(total_prison_pop),
    pris_pop_black = sum(black_prison_pop)
  ) %>% 
  ungroup() %>% 
  mutate(prop_black = pris_pop_black / tot_pris_pop)
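Why is ungroup() needed after grouping by two variables? summarize() removes only the last grouping variable (state), so its output is still grouped by year. The row-wise division in mutate() gives the same result either way. A grouped summary such as mean(), however, would be computed within each year. The sketch below uses hypothetical toy data to show this. It also shows summarize()'s `.groups = "drop"` argument, which returns an ungrouped result in the same way as piping into ungroup():

```r
library(dplyr)

# Hypothetical toy data: two states, two years
toy <- tibble(
  year = c(2000, 2000, 2000, 2001),
  state = c("CA", "CA", "TX", "TX"),
  total_prison_pop = c(100, 300, 500, 400)
)

# Default behavior made explicit: only "state" is peeled off
still_grouped <- toy %>%
  group_by(year, state) %>%
  summarize(tot_pris_pop = sum(total_prison_pop), .groups = "drop_last")
group_vars(still_grouped)  # "year"

# .groups = "drop" removes all grouping
not_grouped <- toy %>%
  group_by(year, state) %>%
  summarize(tot_pris_pop = sum(total_prison_pop), .groups = "drop")
group_vars(not_grouped)    # character(0)
```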

5. More data visualization

And let’s make a plot of all 50 (or try to):

ggplot(data = us_prison_prop_bl, aes(x = year, y = prop_black)) +
  geom_line()

Yuck! What’s happening there?

ggplot has no idea that there is a variable for ‘state’ that we’d want to group by. We can do that a number of ways, but one is to change an aesthetic (like line color) based on the grouping variable:

state_graph <- ggplot(data = us_prison_prop_bl, aes(x = year, y = prop_black)) +
  geom_line(aes(color = state)) +
  theme_minimal() +
  labs(x = "Year",
       y = "Proportion of state prisoners who are black")

state_graph

That’s pretty hard to digest (also, whoa). Some other ways we can break things down:

Interactive graphs with plotly:

ggplotly(state_graph)

What if I want to highlight just one or two states of interest?

Then I could use gghighlight:

state_graph +
  # use_group_by = FALSE applies the condition row by row, which avoids the
  # "grouped calculation failed ... falling back" warning this call otherwise prints
  gghighlight(state %in% c("TX", "CA"), use_group_by = FALSE)
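Back to the tangled plot from section 5: coloring by state is one way to tell ggplot about the grouping. The `group` aesthetic is another. It draws one line per state without assigning colors. This is a sketch on hypothetical toy data. ggplot_build() is used only to confirm that ggplot found one group per state:

```r
library(ggplot2)

# Hypothetical toy data: three states, two years each
toy <- data.frame(
  year = rep(c(2000, 2001), times = 3),
  state = rep(c("CA", "NY", "TX"), each = 2),
  prop_black = c(0.30, 0.29, 0.50, 0.48, 0.35, 0.34)
)

# group = state: one line per state, all drawn in the same color
p <- ggplot(toy, aes(x = year, y = prop_black, group = state)) +
  geom_line()

# The computed layer data carries a group id for every row
built <- ggplot_build(p)
length(unique(built$data[[1]]$group))  # 3
```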

6. Let’s make a choropleth showing those proportions for 2010

First, wrangle object us_prison_prop_bl to just get observations from 2010:

prop_bl_2010 <- us_prison_prop_bl %>% 
  filter(year == 2010)
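The bar chart for 2010 uses forcats::fct_reorder() to sort the states. Here is what that function does on its own, with hypothetical state values. With one value per state, the default summary function (median) is just that value. `.desc = TRUE` would reverse the order:

```r
library(forcats)

# Hypothetical values for three states
toy_states <- c("TX", "CA", "NY")
toy_props <- c(0.10, 0.30, 0.20)

# Default for character data: alphabetical levels
levels(factor(toy_states))      # "CA" "NY" "TX"

# Levels ordered by the value being plotted, smallest first
ordered_states <- fct_reorder(toy_states, toy_props)
levels(ordered_states)          # "TX" "NY" "CA"
```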

A ggplot first:

Note: the fct_reorder() here orders the states by prop_black, so the bars appear sorted by proportion instead of in the default alphabetical order for character data.

ggplot(data = prop_bl_2010, aes(x = fct_reorder(state, prop_black), y = prop_black)) +
  geom_col(aes(fill = prop_black)) + 
  theme_minimal() +
  labs(x = "State abbreviation",
       y = "Proportion of imprisoned people who are black\n(2010 data only)") +
  coord_flip() 

And that’s what we want to show on a map of the United States.

Get the US states data:

states <- read_sf(dsn = here("data","us_spatial"), layer = "states")

Use plot() to take a quick look:

plot(states)

And see what the sf object actually looks like (hint: it looks like a regular data frame, but its geometry column is "sticky" and stays attached as you wrangle).

View(states)

Aha. Now we want to merge the spatial data with the prison attributes:

prison_spatial <- states %>% 
  left_join(prop_bl_2010, by = c("STATE_ABBR" = "state"))

Then look at prison_spatial. Notice that the states with data for 2010 now show up with their matching spatial information! States without 2010 data are still there, with NA in the prison columns.

Finally, let’s make a map of it:

ggplot() +
  geom_sf(data = prison_spatial, 
          aes(fill = prop_black),
          size = 0.2) +
  scale_fill_gradient(low = "yellow", high = "red") +
  theme_minimal()
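The join in this section matches keys that have different names, and it keeps every state. The sketch below uses plain data frames with hypothetical values. `by = c("STATE_ABBR" = "state")` matches the left table's column to the right table's column, and left_join() keeps every row of the left table. Unmatched rows get NA, and geom_sf() draws those states in its NA fill color. In the lab, the sf object is on the left of the join. That placement is what keeps the result an sf object.

```r
library(dplyr)

# Hypothetical stand-ins: every state on the left, attributes missing OR
toy_states <- data.frame(STATE_ABBR = c("CA", "OR", "WA"))
toy_prop <- data.frame(state = c("CA", "WA"), prop_black = c(0.29, 0.18))

joined <- toy_states %>%
  left_join(toy_prop, by = c("STATE_ABBR" = "state"))

joined
# All three states are kept; OR has NA for prop_black
```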

7. So what’s so cool about sf objects?

They’re so cool because sticky geometries mean that you can wrangle them as you would a normal data frame, and the spatial information is retained!

Example: From prison_spatial, filter to include only CA, OR and WA. Then make a choropleth of the total prison population (tot_pris_pop) for those three states.

west_coast_prison <- prison_spatial %>% 
  filter(STATE_ABBR %in% c("CA", "OR", "WA"))

ggplot(data = west_coast_prison) +
  geom_sf(aes(fill = tot_pris_pop))
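To see the stickiness directly, here is a small sketch that uses hypothetical point geometries instead of the states shapefile. After filter() and a select() that names only one attribute column, the result is still an sf object with its geometry column. sf::st_drop_geometry() is the explicit way to remove the geometry when you want a plain data frame:

```r
library(sf)
library(dplyr)

# Hypothetical points standing in for state polygons
toy_sf <- st_sf(
  STATE_ABBR = c("CA", "OR", "TX"),
  tot_pris_pop = c(100, 20, 150),
  geometry = st_sfc(
    st_point(c(-120, 37)), st_point(c(-121, 44)), st_point(c(-99, 31)),
    crs = 4326
  )
)

west <- toy_sf %>%
  filter(STATE_ABBR %in% c("CA", "OR")) %>%
  select(tot_pris_pop)   # geometry comes along without being named

names(west)                    # "tot_pris_pop" "geometry"
plain <- st_drop_geometry(west)
inherits(plain, "sf")          # FALSE
```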

End Part 1


Part 2: Move on to ‘get_your_blog_going.Rmd’!